Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
In biomedical research, many different types of patient data can be
collected, such as various types of omics data and medical imaging modalities.
Applying multi-view learning to these different sources of information can
increase the accuracy of medical classification models compared with
single-view procedures. However, collecting biomedical data can be expensive
and/or burdensome for patients, so it is important to reduce the amount of
required data collection. It is therefore necessary to develop multi-view
learning methods which can accurately identify those views that are most
important for prediction. In recent years, several biomedical studies have used
an approach known as multi-view stacking (MVS), where a model is trained on
each view separately and the resulting predictions are combined through
stacking. In these studies, MVS has been shown to increase classification
accuracy. However, the MVS framework can also be used for selecting a subset of
important views. To study the view selection potential of MVS, we develop a
special case called stacked penalized logistic regression (StaPLR). Compared
with existing view-selection methods, StaPLR can make use of faster
optimization algorithms and is easily parallelized. We show that nonnegativity
constraints on the parameters of the function which combines the views play an
important role in preventing unimportant views from entering the model. We
investigate the performance of StaPLR through simulations, and consider two
real data examples. We compare the performance of StaPLR with an existing view
selection method called the group lasso and observe that, in terms of view
selection, StaPLR is often more conservative and has a consistently lower false
positive rate. Comment: 26 pages, 9 figures. Accepted manuscript.
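The role of the nonnegativity constraints described in this abstract can be illustrated with a minimal sketch of the combining (meta-learning) step only. It assumes that per-view predicted probabilities are already available, and it uses an L1 penalty fitted by proximal gradient descent; the penalty choice, the function name `fit_nonneg_meta_learner`, the tuning values, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
import math

def fit_nonneg_meta_learner(Z, y, lam=0.01, lr=0.5, n_iter=2000):
    """Logistic meta-learner with nonnegative, L1-penalized view weights.

    Z: list of rows, each holding one predicted probability per view.
    y: list of 0/1 labels.
    Returns (intercept, weights); a weight of exactly 0.0 drops that view.
    """
    n, m = len(Z), len(Z[0])
    b, w = 0.0, [0.0] * m
    for _ in range(n_iter):
        # Residuals p_i - y_i of the current logistic model.
        resid = []
        for row, label in zip(Z, y):
            eta = b + sum(wj * zj for wj, zj in zip(w, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - label)
        # Plain gradient step on the unpenalized intercept.
        b -= lr * sum(resid) / n
        # Proximal step: gradient step, L1 shrinkage, projection onto w >= 0.
        for j in range(m):
            grad = sum(r * row[j] for r, row in zip(resid, Z)) / n
            w[j] = max(0.0, w[j] - lr * (grad + lam))
    return b, w

# Made-up toy data: view 0 tracks the labels, view 1 runs against them.
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
Z_toy = [[0.1, 0.6], [0.2, 0.7], [0.3, 0.5], [0.4, 0.8],
         [0.6, 0.4], [0.7, 0.3], [0.8, 0.5], [0.9, 0.2]]
intercept, weights = fit_nonneg_meta_learner(Z_toy, y_toy)
selected_views = [j for j, wj in enumerate(weights) if wj > 0.0]
print(selected_views)
```

In this toy setting the second view could only enter the model with a negative weight, so the projection onto nonnegative values keeps it at zero. Without the constraint, an unconstrained fit would be free to use it.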
Research Openness in Canadian Political Science: Toward an Inclusive and Differentiated Discussion
In this paper, we initiate a discussion within the Canadian political science community about research openness and its implications for our discipline. This discussion is important because the Tri-Agency has recently released guidelines on data management and because a number of political science journals, from several subfields, have signed the Journal Editors' Transparency Statement requiring data access and research transparency (DA-RT). As norms regarding research openness develop, an increasing number and range of journals and funding agencies may begin to implement DA-RT-type requirements. If Canadian political scientists wish to continue to participate in the global political science community, we must take careful note of and be proactive participants in the ongoing developments concerning research openness.
The Bradley-Terry Regression Trunk approach for Modeling Preference Data with Small Trees
This paper introduces the Bradley-Terry regression trunk model, a novel probabilistic approach for the analysis of preference data expressed through paired comparison rankings. In some cases, it may be reasonable to assume that the preferences expressed by individuals depend on their characteristics. Within the framework of tree-based partitioning, we specify a tree-based model estimating the joint effects of subject-specific covariates over and above their main effects. We therefore combine a tree-based model and the log-linear Bradley-Terry model, using the outcome of the comparisons as the response variable. The proposed model provides a way to discover interaction effects when no a priori hypotheses are available. It produces a small tree, called a trunk, that represents a fair compromise between a simple interpretation of the interaction effects and an easy-to-read partition of judges based on their characteristics and the preferences they have expressed. We present an application on a real dataset following two different approaches, and a simulation study to test the model's performance. The simulations show that model performance improves as the number of rankings and objects increases. In addition, performance improves considerably when the judges' characteristics have a high impact on their choices.
Continuous Sweep: an improved, binary quantifier
Quantification is a supervised machine learning task, focused on estimating
the class prevalence of a dataset rather than labeling its individual
observations. We introduce Continuous Sweep, a new parametric binary quantifier
inspired by the well-performing Median Sweep. Median Sweep is currently one of
the best binary quantifiers, but we modify it in three respects:
1) using parametric class distributions instead of empirical
distributions, 2) optimizing decision boundaries instead of applying discrete
decision rules, and 3) calculating the mean instead of the median. We derive
analytic expressions for the bias and variance of Continuous Sweep under
general model assumptions. This is one of the first theoretical contributions
in the field of quantification learning. Moreover, these derivations enable us
to find the optimal decision boundaries. Finally, our simulation study shows
that Continuous Sweep outperforms Median Sweep in a wide range of situations.
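For orientation, the Median Sweep baseline that this abstract builds on can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the usual adjusted-count correction at each threshold, and the function names, the `min_gap` filter on the difference between true and false positive rates, and the toy scores are assumptions. Continuous Sweep itself (parametric class distributions, optimized boundaries, mean instead of median) is not implemented here.

```python
from statistics import median

def rates(scores, labels, threshold):
    """Return (tpr, fpr) of the rule `score >= threshold` on labelled data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr

def median_sweep(train_scores, train_labels, test_scores, thresholds, min_gap=0.25):
    """Estimate the positive-class prevalence of the unlabelled test scores."""
    estimates = []
    for t in thresholds:
        tpr, fpr = rates(train_scores, train_labels, t)
        if tpr - fpr <= min_gap:
            continue  # skip thresholds where the correction is unstable
        observed = sum(s >= t for s in test_scores) / len(test_scores)
        adjusted = (observed - fpr) / (tpr - fpr)  # adjusted-count correction
        estimates.append(min(1.0, max(0.0, adjusted)))  # clip to [0, 1]
    return median(estimates)

# Made-up classifier scores, for illustration only.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.15, 0.25]
prevalence = median_sweep(train_scores, train_labels, test_scores, [0.45, 0.5, 0.55])
print(prevalence)
```

If no threshold passes the `min_gap` filter, `median` raises `StatisticsError`, so a caller would need to handle that case.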
Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer's disease classification
Multi-view data refers to a setting where features are divided into feature
sets, for example because they correspond to different sources. Stacked
penalized logistic regression (StaPLR) is a recently introduced method that can
be used for classification and automatically selecting the views that are most
important for prediction. We introduce an extension of this method to a setting
where the data has a hierarchical multi-view structure. We also introduce a new
view importance measure for StaPLR, which allows us to compare the importance
of views at any level of the hierarchy. We apply our extended StaPLR algorithm
to Alzheimer's disease classification where different MRI measures have been
calculated from three scan types: structural MRI, diffusion-weighted MRI, and
resting-state fMRI. StaPLR can identify which scan types and which derived MRI
measures are most important for classification, and it outperforms elastic net
regression in classification performance. Comment: 36 pages, 9 figures. Accepted manuscript.
The detection and modeling of direct effects in latent class analysis
Several approaches have been proposed for latent class modeling with external variables, including one-step, two-step and three-step estimators. However, very little is known about the performance of these approaches when direct effects of the external variable on the indicators of latent class membership are present. In the current article, we compare those approaches and investigate the consequences of not modeling these direct effects when present, as well as the power of residual and fit statistics to identify such effects. The results of the simulations show that not modeling direct effects can lead to severe parameter bias, especially with a weak measurement model. Both residual and fit statistics can be used to identify such effects, as long as the number and strength of these effects are low and the measurement model is sufficiently strong.
Call-duration and triage decisions in out of hours cooperatives with and without the use of an expert system
Background: Cooperatives delivering out-of-hours care in the Netherlands are hesitant about using expert systems during triage. Apart from the extra costs, cooperatives are not sure that these systems sufficiently enhance the quality of triage, and they believe that call duration will be prolonged drastically. No figures are available on the influence of an expert system on call duration and triage decisions in out-of-hours care in the Netherlands.
Methods: Electronically registered data on call duration and triage decisions were collected in two cooperatives: one in Tilburg, a city in the south of the Netherlands, which uses an expert system, and one in Groningen, a city in the north, which does not. Some other relevant information about the care process was also collected. Call duration was compared using an independent-samples t-test. Triage decisions were compared using a chi-square test.
Results: The mean call time in the cooperative using the TAS expert system is 4.6 minutes, versus 3.9 minutes in the cooperative not using an expert system, a significant difference of 0.7 minutes (95% CI 0.4 to 1.0). In the cooperative with an expert system, a larger percentage of patients is handled by the assistant, and patients are less often referred to a telephone consultation with the GP and are less likely to be offered a visit by the GP. A quick interpretation of the impact of the difference in triage decisions shows that these differences may be large enough to support the hypothesis that longer call duration is compensated for by fewer contacts with the GP (by telephone or face-to-face). There is no proof, however, that these differences are caused by the use of the triage system. The larger number of calls handled by the assistant may be partly caused by the fact that the assistants in the cooperative with an expert system more often consult the GP during triage. It is also unlikely that the larger number of home visits in Groningen can be attributed to the absence of an expert system, because the expert system only advises whether a GP should be seen, not in which way (by consultation in the office or by home visit).
Conclusion: The differences in call times between a cooperative using an expert system and a cooperative not using an expert system are small: 0.4 to 1.0 minutes. Differences in triage decisions were found, but it is not proven that these can be attributed to the use of an expert system.
The Internet addiction components model and personality: Establishing construct validity via a nomological network
There is growing concern over excessive and sometimes problematic Internet use. Drawing upon the framework of the components model of addiction (Griffiths, 2005), Internet addiction appears as a behavioural addiction characterised by the following symptoms: salience, withdrawal, tolerance, mood modification, relapse and conflict. A number of factors have been associated with an increased risk for Internet addiction, including personality traits. The overall aim of this study was to establish the association between personality traits and the Internet addiction components model in order to develop a theoretical framework via a nomological network. Internet addiction and personality traits were assessed in two independent samples of 3,105 adolescents in the Netherlands and 2,257 university students in England. The results indicate that low agreeableness and high neuroticism/low emotional stability are associated with the Internet addiction components factor in both samples. However, low conscientiousness and low resourcefulness predicted it in the adolescent sample only. The implications include the use of the Internet addiction components model as a parsimonious tool for the initial screening of potential clients in mental health institutes, and the identification of populations at risk through their personality traits, which may prove advantageous for the initiation of targeted prevention efforts.